COLLISION HANDLING APPARATUS AND METHOD



FIELD OF THE INVENTION

The present invention relates in general to execution of computer program
instructions, and more specifically to thread-based speculative execution of
computer program instructions out of program order.

BACKGROUND OF THE INVENTION

The performance of computer processors has been tremendously enhanced
over the years. This has been achieved both by making operations faster and
by increasing the parallelism of the processors, i.e. the ability to execute
several operations in parallel. Operations can for instance be made faster by
improving transistors so that they switch faster, or by optimizing the design to
minimize the level of logic needed to implement a given function. Techniques
for parallelism include processing computer program instructions concurrently
in multiple threads. There are programs that are designed to execute in several
concurrent threads, but a program that is designed to execute in a single thread
can also be executed in several concurrent threads. If the execution of a
program in several concurrent threads causes program instructions to be
executed in an order that differs from the program order in which the program
was designed to execute, the thread execution is speculative. The discussion
hereinafter focuses on such speculative thread execution.

A computer program that has been designed to be executed in a single thread
can be parallelized by dividing the program into multiple threads and
However, if the threads access a shared memory, collisions between the
concurrently executed threads may occur. A collision is a situation in which the
threads access the shared memory in such a way that there is no guarantee that
the semantics of the original single-threaded program is preserved.

A collision may occur when two concurrent threads access the same memory
element in the shared memory. An example of a collision is when a first thread
writes to a memory element and the same memory element has already been
read by a second thread which follows the first thread in the program flow of
the single-threaded program. If the write operation performed by the first
thread changes the data in the memory element, the second thread will read
the wrong data, which may give a result of program execution that differs from
the result that would have been obtained if the program had been executed in
a single thread. Depending on the implementation, collisions can for example
also occur when two threads write to the same memory element in the shared
memory.

Execution of a computer program in multiple concurrent threads is intended
to speed up program execution without altering the semantics of the program.
It is therefore of interest to provide a mechanism for detecting collisions.
When a collision has been detected, one or more threads can be rolled back in
order to make sure that the semantics of the single-threaded program is
preserved. A rollback involves restarting a thread at an earlier point in
execution and undoing everything that has been done by the thread after that
point. In the example above, in which the older first thread wrote to a memory
element that had already been read by the younger second thread, the second
thread should be rolled back, at least to the point where the memory element
was read.
A known mechanism for detecting and handling collisions involves keeping
track of accesses to memory elements by means of associating two or more
flag bits per thread with each memory object. One of these flag bits is used to
indicate that the memory object has been read by the thread, and another bit is
used to indicate that the memory object has been modified by the thread.

The international patent application WO 00/70450 describes an example of
such a known mechanism. Before a primary thread writes to a memory
element in a shared memory, status information associated with the memory
element is checked to see if a speculative thread has read the memory element.
If so, the speculative thread is caused to roll back so that it can read the
result of the write operation.

A disadvantage of this known mechanism when implemented in software is
that it results in a large execution overhead due to the communication and
synchronization between the threads that is required for all accesses to the
shared memory. The status information is accessible to several threads, and a
locking mechanism is therefore required in order to make sure that errors do
not occur due to concurrent access to the same status information by two
threads. There is also a need for memory barriers (also called memory fences)
in order to ensure correct ordering between accesses to the shared memory
and accesses to the status information.

Another example of a known mechanism for detecting and handling collisions
is described in Steffan J.G. et al., "The Potential for Using Thread-Level Data
Speculation to Facilitate Automatic Parallelization", Proceedings of the Fourth
International Symposium on High-Performance Computer Architecture.
The flag bits are, according to this technique, associated with cache lines in a
first level cache of each of a plurality of processors. When a thread performs a
write operation, a standard cache coherency protocol invalidates the affected
cache line in the other processors. By extending the cache coherency protocol
to include the thread number in the invalidation request, the other processors
can detect read-after-write dependence violations and perform rollbacks if
necessary. A disadvantage of this approach is that speculatively accessed cache
lines have to be kept in the first level cache until the speculative thread has
been committed; otherwise the extra information associated with each cache
line is lost. If the processor runs out of available positions in the first level
cache during execution of the speculative thread, the speculative thread has to
be rolled back. Another disadvantage is that the method requires modifications
to the cache coherency protocol implemented in hardware, and cannot be
implemented purely in software using standard microprocessor components.

SUMMARY OF THE INVENTION

As mentioned above, the known mechanisms for detecting and handling
collisions have some disadvantages. The problem solved by the present
invention is to provide mechanisms that simplify the detection and handling of
collisions.

A first object of the present invention is to provide a device having simplified
mechanisms for recording information regarding memory accesses to a shared
memory.

A second object of the present invention is to provide a simplified method for

The objects of the present invention are achieved by means of an apparatus
according to claim 1, by means of a method according to claim 17 and by means
of a method according to claim 27. The objects of the invention are further
achieved by means of computer program products according to claim 36 and
claim 37.

According to the present invention, each of a plurality of threads is associated
with a respective data structure for storing information regarding accesses to the
memory elements of the shared memory. When a thread accesses a selected
memory element in the shared memory, information indicative of the access to
the selected memory element is stored in the data structure associated with that
thread. According to an embodiment of the present invention, collision detection
is carried out after the thread has finished executing, by comparing the data
structure of the thread with the data structures of other threads on which the
thread may depend.
An advantage of the present invention is that each thread is associated with a
respective data structure that stores the information indicative of its accesses to
the shared memory. This is especially advantageous in a software
implementation, since each thread will only modify the data structure with which
it is associated. The threads will read the data structures of other threads, but
according to the present invention they will only write to their own associated
data structure. The need for locking mechanisms is therefore reduced
compared with the known solutions discussed above, in which the information
indicative of memory accesses was associated with the memory elements of the
shared memory and was modified by all the threads. The reduced need for
locking mechanisms reduces the execution overhead and makes the
implementation simpler.
Another advantage of the present invention is that since it does not require a
modified cache coherency protocol, it can be implemented purely in software,
thus making it possible to implement the invention using standard components.

Further advantages of embodiments of the present invention will be apparent
from the following detailed description of preferred embodiments with
reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a schematic block diagram of a computer system in which the present
invention is used.

Figs. 2A and 2B are schematic diagrams that illustrate a computer program being
executed in a single thread and divided into several threads, respectively.

Fig. 3A is a schematic block diagram that illustrates how data structures according
to the present invention are used.

Fig. 3B is a schematic block diagram that illustrates how an alternative
embodiment of data structures according to the present invention is used.

Fig. 4 is a flow diagram illustrating how reading from the shared memory may be
performed according to the present invention.

Fig. 5 is a flow diagram illustrating how writing to the shared memory may be
performed according to the present invention.

Fig. 6 is a schematic block diagram that illustrates dependence lists associated
with threads according to the present invention.

Fig. 7 is a flow diagram illustrating how a thread may be executed and a collision
check for the thread may be made according to the present invention.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
divided into a number of memory elements m0, m1, m2, ..., mn. The memory
elements may for instance be equal to a cache line or may alternatively
correspond to a variable or an object in a source language. Figure 1 also shows
three threads 5, 6, 7 executing on the CPUs 2, 3.

A thread can be seen as a portion of computer program code that is defined by
two checkpoints, a start point and an end point. Figure 2A shows a schematic
illustration of a computer program 8 comprising a number of instructions or
operations, I1, I2, ..., In. When the computer program is executed as a single
thread, the normal way of processing the instructions is in the program order,
i.e. from top to bottom in Figure 2A. It is however possible, according to known
techniques as mentioned above, to divide the program into multiple threads.
The program 8 may for instance be divided into the three threads 5, 6, 7 as
indicated in Figure 2A. The threads can be executed concurrently. Figure 2B
illustrates an example of a threaded program flow, where the first CPU 2 first
processes the thread 5 and then the thread 6, and the second CPU 3 starts
processing the thread 7 before the threads 5 and 6 have finished executing on the
first CPU 2.

Figure 2B shows an example of how the threads 5, 6, 7 may execute. Many other
ways of executing the threads are however possible. It is for instance
not necessary that the first CPU 2 finishes processing the thread 5 before
starting on the thread 6, and the thread 6 may be executed before the thread 5.
The first CPU 2 may be a type of processor that is able to switch between
several different threads, such that the CPU 2 e.g. starts processing the thread 5,
leaves the thread 5 before it is finished to process the thread 6, and then returns
to the thread 5. Such a processor is sometimes
Thus, it is not necessary to have multiple CPUs in order to process multiple
threads concurrently.

Collisions may occur between the threads 5, 6, 7 when the instructions of the
computer program 8 are executed out of program order. As mentioned above, a
collision is a situation in which the threads access the shared memory 4 in such
a way that there is no guarantee that the semantics of the original
single-threaded program 8 is preserved. It is therefore of interest to provide
mechanisms for detecting and handling collisions that may arise during
speculative thread execution.

According to the present invention, each thread 5, 6, 7 is associated with a data
structure 9, 10, 11, which is illustrated schematically in Figure 1. The data
structure is used to store information indicating which memory elements in
the shared memory 4 the respective thread has accessed. According to an
embodiment of the present invention, each data structure includes a number of
bits 12 that correspond to the memory elements in the shared memory.
According to the embodiment of the present invention shown in Figure 1, the
bits 12 of each data structure 9, 10, 11 are divided into a load vector 9a, 10a, 11a
and a store vector 9b, 10b, 11b. For each memory element m0, m1, m2, ..., mn in
the shared memory 4, there is exactly one corresponding bit 12 in the load
vector and exactly one corresponding bit 12 in the store vector associated with
each thread. When the thread 5 reads from a memory element, it sets the
corresponding bit 12 in the load vector 9a to indicate that the memory element
has been read. The store vector 9b is updated analogously when the thread 5
writes to the shared memory.
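As a rough illustration, the per-thread data structure of Figure 1 can be sketched in Python under the following assumptions: a one-to-one correspondence between memory elements and bits, Python integers used as bit vectors, and a hypothetical class name `ThreadAccessRecord` that does not appear in the original description.

```python
class ThreadAccessRecord:
    """Hypothetical per-thread data structure (cf. 9, 10, 11): one load bit and
    one store bit for each memory element m0..mn (one-to-one correspondence)."""

    def __init__(self, num_elements: int) -> None:
        self.num_elements = num_elements
        self.load_vector = 0   # bit i set => element mi has been read
        self.store_vector = 0  # bit i set => element mi has been written

    def _check(self, element: int) -> None:
        if not 0 <= element < self.num_elements:
            raise IndexError(f"no memory element m{element}")

    def record_load(self, element: int) -> None:
        # Only the owning thread ever writes to its own record.
        self._check(element)
        self.load_vector |= 1 << element

    def record_store(self, element: int) -> None:
        self._check(element)
        self.store_vector |= 1 << element

    def has_read(self, element: int) -> bool:
        return bool((self.load_vector >> element) & 1)

    def has_written(self, element: int) -> bool:
        return bool((self.store_vector >> element) & 1)


record = ThreadAccessRecord(num_elements=16)
record.record_load(3)
record.record_store(7)
print(record.has_read(3), record.has_written(3))  # True False
```

Because each thread only writes to its own record, recording an access needs no lock; other threads merely read the record later, when collisions are checked.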

There can either be a one-to-one correspondence or a many-to-one
correspondence between the memory elements and the bits in the load and store vectors.

Reduced memory overhead will however also result in reduced execution
overhead, since there will be fewer cache misses. A hash function can be used to
map the number of a memory element to a bit position in the load and store
vectors.
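A minimal sketch of such a hash function, assuming the simple remainder mapping that the description uses for Figure 3A and a hypothetical vector size of 8 bits. It also shows the price of a many-to-one correspondence: distinct memory elements can share a bit, so a check on the vectors may report a collision that did not actually occur.

```python
VECTOR_SIZE = 8  # hypothetical number of bits in each load/store vector


def bit_position(element: int, vector_size: int = VECTOR_SIZE) -> int:
    """Map memory element number `element` to a bit position (remainder hash)."""
    return element % vector_size


def set_bit(vector: int, element: int) -> int:
    """Return `vector` with the bit for `element` set."""
    return vector | (1 << bit_position(element))


# m1 and m9 map to the same bit, so a thread that wrote m1 and a thread
# that read m9 look identical in bit position 1.
store = set_bit(0, 1)   # one thread writes m1
load = set_bit(0, 9)    # another thread reads m9
print(bit_position(1), bit_position(9), bool(store & load))  # 1 1 True
```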

Figure 3A illustrates an example of how the data structures 9, 10, 11 are used
according to the present invention. In this example the thread 5 has written to
the memory elements m1 and m4 and read the memory elements m1, m5 and m8.
The thread 6 has written to the memory elements m2, m6 and m9 and read the
memory elements m2, m6 and m13. The thread 7 has read the memory element
m12. In this example, there are more memory elements in the shared memory
than there are bit positions in the load and store vectors, which means that there
is a many-to-one correspondence between the memory elements and the bits in
the load and store vectors. In this example the bit position in the load and store
vectors that corresponds to a selected memory element is found using a hash
function, which in this example simply calculates the remainder when dividing
the number of the memory element by the size of the load and store vectors.
This means that when the thread 5 writes to the memory element m1, it sets the
bit in position number 1 in its store vector, and when the thread 6 writes to the
memory element m9, it sets the bit in position number 1 in its store vector.

When the threads have performed the write and read operations mentioned
above, the bit position numbers that are set will be 0, 1, 5 for the load vector 9a;
1, 4 for the store vector 9b; 2, 5, 6 for the load vector 10a; 1, 2, 6 for the store
vector 10b; and 4 for the load vector 11a. This is illustrated in Figure 3A by
means of filled boxes representing the bits that are set.
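The bit positions listed above can be reproduced with a short Python sketch. The vector size of 8 is an assumption inferred from the example (m9 maps to position 1, m12 to 4 and m13 to 5); the dictionary layout is purely illustrative.

```python
VECTOR_SIZE = 8  # inferred from the Figure 3A example, not stated explicitly


def bit_positions(elements):
    """Sorted bit positions set for the given memory element numbers."""
    return sorted({m % VECTOR_SIZE for m in elements})


# thread: (elements read, elements written), as described for Figure 3A
accesses = {
    5: ([1, 5, 8], [1, 4]),
    6: ([2, 6, 13], [2, 6, 9]),
    7: ([12], []),
}

for thread, (reads, writes) in accesses.items():
    print(f"thread {thread}: load {bit_positions(reads)}, store {bit_positions(writes)}")
# thread 5: load [0, 1, 5], store [1, 4]
# thread 6: load [2, 5, 6], store [1, 2, 6]
# thread 7: load [4], store []
```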

Figure 3B illustrates an alternative embodiment, which corresponds to the embodiment of Figure 3A
with the only difference that the data structures 9, 10, 11 each include a single
combined load and store vector 9c, 10c, 11c instead of the load vectors 9a, 10a,
11a and the store vectors 9b, 10b, 11b. The bit positions that are set in the
combined load and store vector 9c correspond to a logical bitwise inclusive OR
operation of the load vector 9a and the store vector 9b shown in Figure 3A.

The embodiment of the present invention wherein the data structures include a
single combined load and store vector results in an increased number of
spurious collisions, but on the other hand it also results in a reduced need for
memory to store the data structures and a reduced number of operations when
checking for collisions, as will be discussed further below.
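The combined vectors for the Figure 3A example can be derived with a bitwise OR, as in the following Python sketch. The helper names `to_vector` and `to_positions` are hypothetical; the bit positions are those listed for Figure 3A.

```python
def to_vector(positions):
    """Build an integer bit vector with the given bit positions set."""
    vector = 0
    for p in positions:
        vector |= 1 << p
    return vector


def to_positions(vector):
    """List the bit positions that are set in an integer bit vector."""
    return [i for i in range(vector.bit_length()) if (vector >> i) & 1]


combined_9c = to_vector([0, 1, 5]) | to_vector([1, 4])      # load 9a OR store 9b
combined_10c = to_vector([2, 5, 6]) | to_vector([1, 2, 6])  # load 10a OR store 10b
combined_11c = to_vector([4])                               # load 11a, empty store 11b
print(to_positions(combined_9c), to_positions(combined_10c), to_positions(combined_11c))
# [0, 1, 4, 5] [1, 2, 5, 6] [4]
```

Each thread now keeps one vector instead of two, which halves the memory for the data structures when the vectors have equal size; the cost is that a read can no longer be distinguished from a write.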

The embodiments of the present invention shown in Figures 3A and 3B use a
type of data versioning called privatisation, which means that a private copy 14
of a memory element that is to be modified is created for the thread that
modifies the element. The thread then modifies the private copy instead of the
original memory element in the shared memory. The private copies contain
pointers 15 to their corresponding original memory elements in the shared
memory. The private copies are used to write over the original memory elements
in the shared memory 4 when the threads for which they were created are
committed. If a thread is rolled back, its associated private copies 14 are
discarded. Figure 4 shows a flow diagram illustrating how reading from the
shared memory is performed when privatisation is used. Figure 5 shows a
corresponding flow diagram for writing to the shared memory.

Figure 4 shows a first step 20, wherein the memory element to be read is marked
as read in the load vector. In a following step it is examined whether or not the
thread has a private copy of the memory element to be read.
Figure 5 shows a first step 25, wherein it is examined whether or not the thread
has a private copy of the memory element to be written to. If there is no private
copy, the memory element to be written to is marked as written in the store
vector, step 26, and a private copy is created, step 27. The data is then written to
the private copy, step 28. If a private copy is found to exist in step 25, the data
can be written to the private copy directly, step 28, without having to make a
mark in the store vector or create the private copy.
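The read and write flows of Figures 4 and 5 can be sketched as follows. This is a simplified, single-process Python illustration under stated assumptions: the shared memory is a plain list, the private copies are kept in a dictionary keyed by element number (standing in for the pointers 15), and the load and store vectors are replaced by Python sets for readability. The class name `PrivatisingThread` is hypothetical.

```python
class PrivatisingThread:
    """Hypothetical sketch of privatisation (Figures 4 and 5)."""

    def __init__(self, shared: list) -> None:
        self.shared = shared   # shared memory 4: element number -> value
        self.private = {}      # private copies 14, keyed by element number
        self.loads = set()     # stands in for the load vector
        self.stores = set()    # stands in for the store vector

    def read(self, element: int):
        self.loads.add(element)                 # step 20: mark as read
        if element in self.private:             # private copy exists?
            return self.private[element]
        return self.shared[element]

    def write(self, element: int, value) -> None:
        if element not in self.private:         # step 25: no private copy yet
            self.stores.add(element)            # step 26: mark as written
            self.private[element] = self.shared[element]  # step 27: create copy
        self.private[element] = value           # step 28: write to the copy

    def commit(self) -> None:
        # Private copies overwrite the original elements in shared memory.
        for element, value in self.private.items():
            self.shared[element] = value
        self.private.clear()

    def rollback(self) -> None:
        # Private copies are discarded; clearing the access records before
        # a restart is an assumption of this sketch.
        self.private.clear()
        self.loads.clear()
        self.stores.clear()


shared = [0, 0, 0, 0]
thread = PrivatisingThread(shared)
thread.write(2, 42)
print(shared[2], thread.read(2))  # 0 42  (the shared element is untouched)
thread.commit()
print(shared[2])                  # 42
```

The access records are deliberately kept after commit, since younger threads may still need to compare against them.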

The privatisation described above is not a prerequisite of the present invention.
Another type of data versioning, which may be used instead of privatisation,
involves the threads storing backup copies of the memory elements before
they modify them. These backup copies are then copied back to the shared
memory during a rollback.
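For comparison, a minimal Python sketch of the backup-copy alternative, assuming the shared memory is a plain list and that only the value before a thread's first write to an element needs to be kept. The class name `BackupThread` is hypothetical. Unlike privatisation, the thread writes to shared memory directly, so committing only discards the backups, while a rollback restores them.

```python
class BackupThread:
    """Hypothetical sketch of data versioning with backup copies."""

    def __init__(self, shared: list) -> None:
        self.shared = shared
        self.backups = {}  # element number -> value before the first write

    def write(self, element: int, value) -> None:
        if element not in self.backups:          # keep only the original value
            self.backups[element] = self.shared[element]
        self.shared[element] = value             # modify shared memory in place

    def rollback(self) -> None:
        for element, old_value in self.backups.items():
            self.shared[element] = old_value     # copy backups back
        self.backups.clear()

    def commit(self) -> None:
        self.backups.clear()                     # backups are no longer needed


shared = [1, 2, 3]
thread = BackupThread(shared)
thread.write(0, 10)
thread.write(0, 20)
thread.rollback()
print(shared)  # [1, 2, 3]
```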



The embodiments of the present invention described above comprise data
structures in the form of bit vectors for storing information regarding each
thread's accesses to the memory. However, many alternative types of data
structures for storing this information are possible according to the present
invention. The data structures may for instance be implemented as lists to which
numbers that correspond to the memory elements are added to indicate accesses
to the memory elements. Other possible implementations of the data structures
include trees, hash tables and other representations of sets.
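As a sketch of one such alternative, assuming Python sets as the set representation, the accesses of the Figure 3A example can be stored as exact element numbers and compared with the hashed form (vector size 8, remainder hash, with the hashed bit positions also held in sets for brevity). The helper names `as_bits` and `old_stores_hit` are hypothetical; the latter applies the rule "the older thread's stores intersect the younger thread's loads or stores" to both representations. Exact sets avoid the aliasing of the many-to-one mapping at the price of storage that grows with the number of accessed elements.

```python
VECTOR_SIZE = 8  # assumed, as in the Figure 3A example


def as_bits(elements):
    """Hashed form: bit position (m % VECTOR_SIZE) for every element m."""
    return {m % VECTOR_SIZE for m in elements}


def old_stores_hit(old_stores, young_loads, young_stores):
    """True if anything the older thread wrote was accessed by the younger one."""
    return bool(old_stores & (young_loads | young_stores))


# Thread 5 (older) wrote m1, m4; thread 6 (younger) read m2, m6, m13
# and wrote m2, m6, m9.
t5_stores = {1, 4}
t6_loads, t6_stores = {2, 6, 13}, {2, 6, 9}

exact = old_stores_hit(t5_stores, t6_loads, t6_stores)
hashed = old_stores_hit(as_bits(t5_stores), as_bits(t6_loads), as_bits(t6_stores))
print(exact, hashed)  # False True  (m1 and m9 share bit 1)
```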

It will now be discussed how the thread-associated data structures of the present
invention can be used to check for and detect collisions.

In a software implementation where the thread-associated data structures of the
This sending of messages takes time and causes an extra delay, which can be
avoided by means of the present invention.

According to a preferred embodiment of the present invention, collision checks
are performed after the thread has finished its execution and is about to be
committed. The collision check is made by comparing the data structure
associated with the thread to be checked with the data structures associated
with other threads on which the thread to be checked may depend. In order to
keep track of the possible dependencies between threads, a dependence list may
be created for each thread before it starts executing. This is illustrated in
Figure 6 by means of the threads 5, 6, 7, which are associated with dependence
lists 16, 17 and 18 respectively. The dependence lists are lists of all older threads
that had not yet been committed when the thread was about to start executing.
The thread 7 may depend on the threads 5 and 6, so its dependence list 18 contains
references to the threads 5 and 6 to indicate the possible dependency.
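A minimal sketch of how such a dependence list might be built, assuming threads are identified by their reference numbers and the program order is available as a list. The function name `create_dependence_list` and its parameters are hypothetical.

```python
def create_dependence_list(thread_id, program_order, committed):
    """Hypothetical helper: every thread that precedes `thread_id` in program
    order and has not yet been committed when `thread_id` is about to start."""
    position = program_order.index(thread_id)
    return [t for t in program_order[:position] if t not in committed]


order = [5, 6, 7]  # program order of the threads in Figure 6
print(create_dependence_list(7, order, committed=set()))  # [5, 6]
print(create_dependence_list(6, order, committed={5}))    # []
```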

The dependence list described above is just an example of how to keep track of
possible dependencies between threads. The dependence list is not limited to a
list structure but can also be represented as an alternative structure that can store
information regarding possible dependencies. It is further not necessary for the
dependence list to store a reference to all older not yet committed threads. For
example, in an implementation where forwarding is used it may be possible to
determine that the thread to be started is not dependent on some of the older
not yet committed threads, and it is then not necessary to store a reference to
these threads in the dependence list. In other cases the information stored in the
dependence list may refer to an interval of threads of which some have already
been committed when the dependence list is created. As long as the dependence
list includes a reference to all the threads that the thread to be started depends on



Figure 7 shows a flow diagram of how a thread may be executed and a collision
check for the thread may be made according to the present invention. In a step
30, the dependence list for the thread to be executed is created. The thread is
then executed in a step 31. When the thread has finished executing, it waits until
the threads that it may depend on have been checked for collisions and are ready
to be committed, step 32. It then compares its associated data structure with the
data structures associated with the threads in the dependence list to check for
collisions, step 33. If no collision is detected, the thread is committed in a step
34; otherwise the thread is rolled back in a step 35. If the thread has collided
with another thread, the risk that the thread collides with the same thread again
may be reduced by delaying the restart of the thread until the thread it
collided with has been committed. The system may be arranged to give higher
priority to committing threads with which other threads have collided.
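The check-and-commit part of Figure 7 (steps 33 to 35) might be sketched as below. This is a sequential Python illustration under stated assumptions: the `SpeculativeThread` record and `check_and_commit` function are hypothetical, the caller is assumed to run the checks in program order (which stands in for the waiting in step 32), and a rollback is modelled only as clearing the thread's access vectors. Returning the thread that caused the collision lets a scheduler delay the restart until that thread has been committed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpeculativeThread:
    """Hypothetical thread record with integer bit vectors and a dependence list."""
    tid: int
    load_vector: int = 0
    store_vector: int = 0
    depends_on: List["SpeculativeThread"] = field(default_factory=list)
    committed: bool = False


def collides(older: SpeculativeThread, younger: SpeculativeThread) -> bool:
    # old store vector AND (young store vector OR young load vector)
    return bool(older.store_vector & (younger.store_vector | younger.load_vector))


def check_and_commit(thread: SpeculativeThread) -> Optional[SpeculativeThread]:
    """Return None if the thread was committed, or the older thread it
    collided with if it was rolled back."""
    for older in thread.depends_on:
        if collides(older, thread):                        # step 33
            thread.load_vector = thread.store_vector = 0   # step 35: roll back
            return older
    thread.committed = True                                # step 34
    return None


t1 = SpeculativeThread(1, store_vector=0b0010)                      # wrote bit 1
t2 = SpeculativeThread(2, load_vector=0b0100, depends_on=[t1])      # read bit 2
t3 = SpeculativeThread(3, load_vector=0b0010, depends_on=[t1, t2])  # read bit 1

for t in (t1, t2, t3):  # collision checks in program order
    culprit = check_and_commit(t)
    print(t.tid, "committed" if culprit is None else f"rolled back (collided with {culprit.tid})")
# 1 committed
# 2 committed
# 3 rolled back (collided with 1)
```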

When the collision check is performed as described above, even the oldest not
yet committed thread is speculative, since it might have collided with an earlier
thread that already has been committed, and this is not detected until the thread
has finished its execution. However, when a thread has become the oldest not
yet committed thread, it will have to be rolled back at most once, since when
it is restarted, there is no other thread that it can collide with.

Alternatively, one or several partial collision checks may be performed during
execution, before performing the collision check when the thread has finished
executing. The partial collision check can be performed without locking the data
structures associated with other threads, because it is acceptable that the partial
check fails to detect some collisions. Collisions that were not detected in the
load and store vectors or a combined load and store vector. If the data
structures have separate load and store vectors, the comparison between the
load and store vectors of an older and a younger thread can be carried out by
performing the following logical operations bitwise on the bit vectors:

old store vector AND (young store vector OR young load vector).

If the resulting vector contains any bits that are set, there is a collision and the
younger thread should be rolled back. If the data structures have combined load
and store vectors, the corresponding logical operation to be performed to check
for collisions is an AND operation between the combined vector of the older
thread and the combined vector of the younger thread.
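The two comparisons can be written directly with integer bit vectors, as in this Python sketch (the function names are hypothetical). The example uses two threads that only read the same memory element: the separate-vector check reports no collision, whereas the combined-vector check reports a spurious one, which is the trade-off mentioned for the combined vector.

```python
def collision_separate(old_store, young_load, young_store):
    """old store vector AND (young store vector OR young load vector)."""
    return (old_store & (young_store | young_load)) != 0


def collision_combined(old_combined, young_combined):
    """Combined vector of the older thread AND combined vector of the younger."""
    return (old_combined & young_combined) != 0


# Both threads only read the memory element mapped to bit 5.
old_load, old_store = 1 << 5, 0
young_load, young_store = 1 << 5, 0
print(collision_separate(old_store, young_load, young_store))              # False
print(collision_combined(old_load | old_store, young_load | young_store))  # True

# A real read-after-write conflict on bit 3 is reported by both checks.
print(collision_separate(1 << 3, 1 << 3, 0), collision_combined(1 << 3, 1 << 3))  # True True
```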

In an alternative embodiment the comparison to detect collisions is carried out
by performing the following logical operation bitwise on the bit vectors:

old store vector AND young load vector.

This comparison assumes that the threads are committed in program order and
that when a write operation that only modifies part of a memory element (which
corresponds to a read-modify-write operation) is carried out, the corresponding
bit in both the load and the store vector is set.
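A small Python sketch contrasting this alternative comparison with the previous one, under the stated assumption that threads commit in program order. The function names are hypothetical. A younger thread that overwrites a whole element without reading it no longer counts as a collision, because committing in program order lets its value correctly replace the older one; a partial write is recorded as a read-modify-write and is still caught.

```python
def collision_full(old_store, young_load, young_store):
    """old store vector AND (young store vector OR young load vector)."""
    return (old_store & (young_store | young_load)) != 0


def collision_load_only(old_store, young_load):
    """old store vector AND young load vector (threads commit in program order)."""
    return (old_store & young_load) != 0


BIT = 1 << 3  # the bit of the memory element both threads touch

# Younger thread overwrites the whole element without reading it.
print(collision_full(BIT, 0, BIT), collision_load_only(BIT, 0))  # True False

# Younger thread modifies only part of the element: a read-modify-write,
# so both its load bit and its store bit are set.
print(collision_full(BIT, BIT, BIT), collision_load_only(BIT, BIT))  # True True
```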

An advantage of the collision check of the present invention is that since
collisions do not have to be detected until the thread has finished executing,
there is no need for any locking mechanism or memory barriers during
execution. This reduces the execution overhead and makes the implementation
simpler. Another reason why the execution overhead can be reduced according
to the present invention is that
during execution. In the known mechanisms discussed above, a collision check
was performed in connection with each access to the shared memory.

The cost of handling collisions according to the present invention is that
collisions are not detected as early as possible, which results in some wasted data
processing of threads that have already collided and should be rolled back.
However, the gain in execution overhead will in many cases surpass the cost of
not detecting collisions immediately. The collision check of the present
invention described above is thus particularly favorable when collisions are rare.

According to the present invention, the only thing that has to be performed in
the same order as in the original single-threaded program is the collision check.
Threads can be executed and rolled back out of program order and, depending
on the implementation, sometimes also committed out of program order.

If the many-to-one correspondence between the memory elements and the bits
in the load and store vectors is used, the load and store vectors can be given a
fixed size. The memory overhead is then proportional to the number of threads
instead of the number of memory elements, which means that the amount of
memory needed to store the data structures will remain the same when the
number of memory elements in the shared memory increases.
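The difference can be made concrete with a little arithmetic. The following Python sketch assumes separate load and store vectors (two vectors per thread; a combined vector would use one) and hypothetical sizes: 8 threads, shared memories of 1,000,000 and 2,000,000 elements for the one-to-one case, and a fixed vector size of 4,096 bits for the many-to-one case.

```python
def data_structure_bytes(num_threads: int, bits_per_vector: int,
                         vectors_per_thread: int = 2) -> int:
    """Bytes needed for all load/store bit vectors, rounded up to whole bytes."""
    total_bits = num_threads * vectors_per_thread * bits_per_vector
    return (total_bits + 7) // 8


# One-to-one: one bit per memory element, so the size grows with memory size.
print(data_structure_bytes(8, 1_000_000))   # 2000000
print(data_structure_bytes(8, 2_000_000))   # 4000000

# Many-to-one with a fixed vector size: independent of the memory size.
print(data_structure_bytes(8, 4_096))       # 8192
```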

The present invention can be implemented both in hardware and in software. In
a hardware implementation it is possible to use a fast fixed-size memory inside
each processor to store the data structures. In a software implementation, a
speed advantage will be obtained if the data structures are made small enough to
be stored in the first level cache of the processor. Due to the frequent use of the
data structures, it is advantageous to store them in as fast a memory as possible.
depend on the thread are committed. Once the thread and all threads that may
depend on it are committed, the memory used to store its associated data
structure can be reused.

The present invention is not limited to any particular type of memory elements
of a shared memory. The present invention is applicable to both logical and
physical memory elements. Logical memory elements are for example variables, 
vectors, structures and objects in an object oriented language. Physical memory 
elements are for example bytes, words, cache lines, memory pages and memory 
segments. 

As described above, a thread comprises a number of program instructions. Other
terms for a series of instructions are sometimes used in the field. An
example of such a term is job.

Thread-level speculative execution with a shared memory has many similarities
to a database transaction system. The entries of a database can be compared
with the elements of a shared memory, and since a database transaction includes
a number of operations, a database transaction can be compared with a thread.
One way to ensure that a database remains consistent is to check for collisions
between different database transactions. Thus the principles of the present
invention may also be used in this field.

It is to be understood that the embodiments of the present invention discussed
above and illustrated in the figures merely serve as examples to illustrate the
ideas of the present invention, and that the invention in no way is limited to just
the examples described. The examples are for instance simple examples that
only illustrate a few memory elements in the shared memory and a few
CLAIMS

1. An apparatus that supports execution of computer program instructions
speculatively out of program order, comprising:

- a plurality of threads for executing computer program instructions, and

- a shared memory, which comprises a number of memory elements accessible
to the plurality of threads;
wherein each of the threads is associated with a data structure for storing
information regarding accesses to the memory elements of the shared memory,
and wherein each of the threads has means for accessing a selected memory
element in the shared memory and means for storing information in the
associated data structure indicative of the access to the selected memory
element.

2. The apparatus according to claim 1, wherein the data structures are one of the
following types of structures: an unsorted list, a sorted list, a tree and a table.

3. The apparatus according to claim 1, wherein each data structure comprises a
number of bits that correspond to the memory elements of the shared memory
and wherein the means for storing information are means for setting at least one
chosen bit, which at least one chosen bit corresponds to the selected memory
element.

4. The apparatus according to claim 3, wherein the data structure comprises a
load vector and a store vector, wherein the means for setting at least one chosen
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5. The apparatus according to claim 3, wherein the dam structure comprises a 
sinde combined load and store vector. 

6. The appaiatus according to claim 4 01 5, wherein there is a one-to-one 
5 correspondence between the memory elements in the shared memory and the 

bits in die or each, vector of the data structure. 

7. The apparatus according to claim 4 or 5, wherein there is a many-to-one 
correspondence between die memory elements in die shared memory and die 

1 0 bits in die or each vector of the data structure. 

8. The apparatus according to claim 7, wherein the correspondence between the
bits in the or each vector and the memory elements is determined by a hash
function that maps the memory elements to the bits in the or each vector.


9. The apparatus according to any of claims 1-8, wherein the apparatus further
comprises means for checking whether a thread has a private copy of the
selected memory object, means for creating a private copy of the selected
memory object and means for reading and writing to a private copy of the
selected memory object.

10. The apparatus according to any of claims 1-8, wherein the apparatus further
comprises means for storing a backup copy of the selected memory element.

11. The apparatus according to any of claims 3-10, wherein the apparatus
further comprises means for checking, when a first thread has
checking, comprises means for comparing the data structure associated with the
first thread with each respective data structure associated with the threads on
which the first thread may depend.

12. The apparatus according to claim 11, wherein the apparatus further
comprises means for creating a dependence list associated with the first thread
before execution of the first thread, which dependence list includes a reference
to each thread which has not yet been committed and which comes before the
first thread in program order.

13. The apparatus according to claim 11 or 12, wherein the apparatus further
comprises means for committing the first thread if no collision is detected
between the first thread and any of the threads on which the first thread may
depend and means for restarting execution of the first thread if a collision is
detected between the first thread and any of the threads on which the first
thread may depend.

14. The apparatus according to claim 13, wherein the apparatus further
comprises means for delaying a restart of execution of the first thread until the
thread or each of the threads with which the first thread has collided has been
committed.

15. The apparatus according to claim 14, wherein the apparatus further
comprises means for giving priority to committing and/or executing the thread
or each of the threads with which the first thread has collided.
depend, which means for performing a partial check comprises means for
comparing the data structure associated with the first thread with the respective
data structure associated with the at least one of the threads on which the first
thread may depend.


17. A method for recording information regarding accesses to a shared
memory, which shared memory is accessible to a plurality of threads that are
arranged to execute computer program instructions speculatively out of program
order, which method includes the steps of:

- a first of the plurality of threads accessing a selected memory element in the
shared memory, and

- the first thread storing information indicative of the access to the selected
memory element in a data structure associated with the first thread.

18. The method according to claim 17, wherein the data structure is one of the
following types of structures: an unsorted list, a sorted list, a tree and a table.

19. The method according to claim 17, wherein each data structure comprises a
number of bits that correspond to the memory elements of the shared memory
and wherein the step of storing information comprises setting a chosen bit in the
data structure, which chosen bit corresponds to the selected memory element.

20. The method according to claim 19, wherein the data structure comprises a
load vector and a store vector, wherein the chosen bit is a bit in the load vector
if the first thread accesses the selected memory element in order to read it, and

21. The method according to claim 19, wherein the data structure comprises a
single combined load and store vector.

22. The method according to claim 20 or 21, wherein there is a one-to-one
correspondence between the memory elements in the shared memory and the
bits in the or each vector of the data structure.

23. The method according to claim 20 or 21, wherein there is a many-to-one
correspondence between the memory elements in the shared memory and the
bits in the or each vector of the data structure.

24. The method according to claim 23, wherein the correspondence between the
bits in the or each vector and the memory elements is determined by means of
mapping the memory elements to the bits in the or each vector using a hash
function.

25. The method according to any of claims 17-24, comprising the further steps
of:

- the first thread checking whether it has a private copy of the selected memory
object;

- if the first thread has a private copy and the first thread accesses the selected
memory element in order to read it, the first thread reading from the private
copy;

- if the first thread does not have a private copy and the first thread accesses
the selected memory element in order to read it, the first thread reading from
the shared memory; and

- if the first thread does not have a private copy and the first thread accesses
the selected memory element in order to write to it, the first thread creating a
private copy of the selected memory element and writing to the private copy.

26. The method according to any of claims 17-24, comprising the further steps
of, if the first thread accesses the selected memory element in order to write to
it, the first thread storing a backup copy of the selected memory element and the
first thread writing to the selected memory element in the shared memory after
the backup copy is stored.



27. A method for handling possible collisions between a plurality of threads,
which threads are arranged to execute computer program instructions
speculatively out of program order and to access memory elements of a shared
memory, which method includes the steps of:

- executing a first thread;

- checking, when the first thread has finished execution, if each of the threads on
which the first thread may depend is ready to be committed;

- waiting until each of the threads on which the first thread may depend is ready
to be committed, if each of the threads on which the first thread may depend is
not ready to be committed; and

- checking for collision between the first thread and each of the threads on which
the first thread may depend by means of comparing a data structure associated
with the first thread with a data structure associated with the thread on which
the first thread may depend, which data structures store information regarding
which of the memory elements the thread with which the data structure is
associated has accessed during execution of the thread.



and wherein a bit is set if the memory element to which the bit corresponds has
been accessed by the thread with which the data structure is associated during
execution of the thread.

29. The method according to claim 28, wherein each data structure comprises a
load vector and a store vector, wherein a bit in the load vector is set if the
memory object to which the bit corresponds has been read by the thread with
which the data structure is associated during execution of the thread and
wherein a bit in the store vector is set if the memory object to which the bit
corresponds has been written to by the thread with which the data structure is
associated during execution of the thread.

30. The method according to claim 28, wherein each data structure comprises a
single combined load and store vector.

31. The method according to any of claims 27-30, wherein the method further
comprises the step of creating a dependence list associated with the first thread
before execution of the first thread, which dependence list includes a reference
to each thread which has not yet been committed and which comes before the
first thread in program order.

32. The method according to any of claims 27-31, wherein the first thread is
committed if no collision is detected and wherein the execution of the first
thread is restarted if a collision is detected.
34. The method according to claim 33, wherein priority is given to committing
and/or executing the thread or each of the threads with which the first thread
collided.

35. The method according to any of claims 27-34, comprising the further step of
performing a partial check for collisions between the first thread and at least one
of the threads on which the first thread may depend by means of comparing the
data structure associated with the first thread with the respective data structure
associated with the at least one of the threads on which the first thread may
depend, wherein no locking of the data structures takes place while the partial
check is performed.

36. A computer program product comprising computer code means for
performing the method of any of claims 17-26 when run on a computer.

37. A computer program product comprising computer code means for
performing the method of any of claims 27-35 when run on a computer.